Abstract. We review and improve a recently introduced method for the detection of communities
in complex networks. This method combines spectral properties of some matrices encoding the
network topology with well-known hierarchical clustering techniques, and uses the modularity
parameter to quantify the goodness of any possible community subdivision. This provides one of
the best available methods for the detection of community structures in complex systems.

Complex networks have recently been an active topic of investigation in physics 
because of their relevance in the modeling of many real complex systems ranging from 
social and communication networks to biology and neural sciences [1]. A common 
feature of many of these real networks is the presence of communities, that is, subsets of
nodes with high mutual interconnectivity and only a few links to the rest of the network.
The importance of their proper detection stems from many different causes: first of
all they provide a coarse-grained structure that can notably simplify the analysis of a
large network. Moreover, communities can be identified as functional units in several 
cases of biochemical or neural networks. Therefore, even if there is no commonly 
accepted quantitative definition of community, many algorithms have been proposed 
to split a network into densely interconnected subsets [2, 3, 4]. A recent comparative 
review of most of the available community finding methods can be found in [5]. 

For other problems similar in spirit to this one, such as graph partitioning (into a
given number of subsets), image partitioning, or graph visualization, spectral techniques
have proven to be very useful [6]. Such methods are based on the spectral analysis of 
a suitable matrix encoding the corresponding network topology. Similar techniques can 
also be exploited for the detection of communities [7, 8]. 

Here we give a brief outline of a method we recently introduced [8], which combines
spectral properties, hierarchical clustering techniques, and the optimization of the
modularity (a quantity introduced to quantify the validity of any given community
subdivision) [9].

The nodes of a given network are represented as points in a D-dimensional space
whose coordinates are the components of the first D non-trivial eigenvectors of the
corresponding Laplacian matrix [11]. Once the nodes have been embedded in this space, a
distance (Euclidean, angular, etc. [8]) between them can be defined. Afterwards,
standard methods such as hierarchical clustering techniques [10, 8] are employed to group
the nodes according to their mutual distances: nearby sites are progressively grouped
together. Proceeding like this, a dendrogram, that is, a tree representing the hierarchy, is
obtained. In order to determine at which level the "tree" should be cut to obtain
the best community splitting, we have to quantify the quality of the partitions. For this
purpose, the modularity Q, defined as the fraction of internal edges minus its expected
value for a random graph with the same number of links for each community, has been
introduced. The output of the algorithm is therefore the partition of the dendrogram
giving the highest value of Q.
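
The selection step described above (build a dendrogram, then keep the level with the highest Q) can be sketched as follows. This is only an illustrative sketch, not the implementation of [8]: the expression used for Q is the usual Newman-Girvan form, assumed here to match the definition in the text, and the names `modularity`, `complete_linkage_levels`, `best_partition`, `example_edges` and `example_points` are hypothetical. The one-dimensional coordinates in `example_points` are made-up values standing in for a spectral embedding.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)


def modularity(edges, communities):
    """Q = sum over communities of (fraction of internal edges - expected fraction)."""
    m = len(edges)
    label = {node: c for c, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)  # edges with both ends inside community c
    degree = [0] * len(communities)    # total degree of the nodes in community c
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree))


def complete_linkage_levels(points):
    """Agglomerative clustering; returns the partition at every dendrogram level."""
    clusters = [[i] for i in range(len(points))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # complete linkage: cluster distance is the largest pairwise distance
            d = max(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels


def best_partition(edges, points):
    """Cut the dendrogram at the level that maximizes the modularity."""
    return max(complete_linkage_levels(points),
               key=lambda part: modularity(edges, part))


# Toy graph: two triangles joined by a single edge (illustrative data only).
example_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
# Assumed 1-D embedding of the six nodes (made-up coordinates).
example_points = [(-1.0,), (-0.9,), (-0.6,), (0.6,), (0.9,), (1.0,)]

best = best_partition(example_edges, example_points)
print(sorted(sorted(c) for c in best), modularity(example_edges, best))
```

In the method outlined in the text the same cut is repeated for several embedding dimensions D, and the partition with the overall highest Q is kept.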

The justification for using the eigenvectors of the Laplacian matrix representing
the network can be understood by exploiting the connection between the eigenvalue
problem and the minimization of the quadratic form 

$$\sum_{\mathrm{links}} (x_i - x_j)^2 = x^T L x, \qquad (1)$$

where x is a vector of real values $x_i$ assigned to the nodes and L is the Laplacian
matrix [11]. Minimizing this expression is a way to impose the condition that connected
nodes should be given similar values of x. Indeed, it is easy to see that minimizing
equation (1) with a normalization condition on the vector x ($\sum_i x_i^2 = 1$) yields the eigenvalue
equation for the matrix L. The first eigenvector is trivial (constant) and the corresponding
eigenvalue is zero: if all $x_i$ are equal the sum (1) is zero, which is its minimum
possible value. The next eigenvector (with an eigenvalue larger than zero for any
connected network) corresponds to the non-trivial minimum, and therefore its components
can be used to partition the nodes. Moreover, as shown in [8], the subsequent eigenvectors
also contain useful information and can be profitably used to find communities in the
network. The number of eigenvectors D that have to be taken into account in order to
obtain a good detection of communities is not known a priori. Therefore, the whole
procedure is repeated in the algorithm for different values of D, and the subdivision corresponding
to the highest value of the modularity is selected.
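
The link between the quadratic form (1) and the first non-trivial eigenvector can be checked numerically on a small graph. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses a shifted power iteration with the constant vector projected out, a technique chosen here only because it needs no external library, and the names `laplacian`, `link_sum`, `quadratic_form`, `fiedler_vector` and `example_edges` are hypothetical.

```python
def laplacian(n, edges):
    """Dense Laplacian: degrees on the diagonal, -1 for each link."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def link_sum(edges, x):
    """Left-hand side of equation (1): sum over links of (x_i - x_j)^2."""
    return sum((x[u] - x[v]) ** 2 for u, v in edges)


def quadratic_form(L, x):
    """Right-hand side of equation (1): x^T L x."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def fiedler_vector(L, iterations=500):
    """First non-trivial eigenvector of L via power iteration on (shift*I - L)."""
    n = len(L)
    # 2 * (max degree) bounds the largest eigenvalue of L, so shift*I - L
    # reverses the ordering of the spectrum and stays non-negative.
    shift = 2.0 * max(L[i][i] for i in range(n))
    x = [float(i) for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the trivial constant eigenvector
        norm = sum(xi * xi for xi in x) ** 0.5
        x = [xi / norm for xi in x]
        x = [shift * x[i] - sum(L[i][j] * x[j] for j in range(n))
             for i in range(n)]
    mean = sum(x) / n
    x = [xi - mean for xi in x]
    norm = sum(xi * xi for xi in x) ** 0.5
    return [xi / norm for xi in x]


# Toy graph: two triangles joined by a single edge (illustrative data only).
example_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
L = laplacian(6, example_edges)
x = fiedler_vector(L)
print([round(xi, 3) for xi in x])
print(link_sum(example_edges, x), quadratic_form(L, x))
```

On this toy graph the components of the resulting vector have one sign on the first triangle and the opposite sign on the second, which is the sense in which the eigenvector components partition the nodes.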

If we assign to the nodes a weight proportional to their degree, the normalization
condition becomes $\sum_i k_i x_i^2 = x^T \mathcal{D} x = 1$, where $k_i$ is the degree of node $i$ and
$\mathcal{D}$ is the diagonal matrix of the degrees; in this case the minimization of equation (1)
is transformed into the eigenvalue equation for the matrix $L' = \mathcal{D}^{-1} L$. As before, the
first non-trivial eigenvector corresponds to the non-trivial minimum of the sum (1).
Therefore, we can ask how the performance of the original method (as presented in [8])
is affected by replacing the eigenvectors of L by those of L'.
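
The step from the constrained minimization to the eigenvalue equation can be made explicit with a Lagrange multiplier. The short derivation below is a sketch; the multiplier $\lambda$, the function $F$, and the symbol $\mathcal{D}$ for the diagonal matrix of the node degrees $k_i$ are notation assumed here.

```latex
\begin{align}
F(x) &= x^T L x - \lambda \left( x^T \mathcal{D} x - 1 \right), \\
\frac{\partial F}{\partial x} &= 2 L x - 2 \lambda \mathcal{D} x = 0
\;\Longrightarrow\; L x = \lambda \mathcal{D} x
\;\Longrightarrow\; \mathcal{D}^{-1} L x = \lambda x .
\end{align}
```

Replacing $\mathcal{D}$ by the identity recovers the unweighted case, $L x = \lambda x$. At any stationary point $x^T L x = \lambda\, x^T \mathcal{D} x = \lambda$, so the value of the sum (1) equals the eigenvalue, and the smallest non-zero eigenvalue gives the non-trivial minimum. The last implication requires $\mathcal{D}$ to be invertible, that is, a network without isolated nodes.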

First of all, we applied both algorithms (with L and with L', respectively)¹ to computer-generated
networks with a given community structure [2]. These networks contain 128
nodes, split into 4 equal-size communities; edges are randomly extracted in such a way
that each node has, on average, $k_{in}$ links to other nodes in the same community and
$k_{out}$ links to the rest of the network, with $k_{in} + k_{out} = 16$. For small $k_{out}$ the communities are
almost disconnected, while as this value increases they become less and less separated, so
that detecting them becomes a very difficult and not clear-cut task. Since the communities
are known, we can measure the quality of the algorithm by counting the number of nodes
that are correctly classified. In figure 1 we plot the corresponding fraction of nodes, and
we can see that when the eigenvectors of the normalized Laplacian matrix L' are used,
the method produces much better results. Moreover, in a very recent independent and
systematic comparison of different community-finding methods performed by Danon
et al. [5], it has been found that our method, equipped with the normalized Laplacian
matrix, exhibits an extremely good performance and is among the most convenient
choices.

¹ An implementation of the algorithms can be found at http://www.ugr.es/~donetti/.

FIGURE 1. Fraction of nodes correctly classified by the algorithm (averaged over 200 networks) as a
function of $k_{out}$, using the eigenvectors of L and L'. In both cases angular distance and complete linkage
clustering are used (see [8]).
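
A generator for test networks of the kind described above (128 nodes, 4 equal-size communities, average total degree 16) might look like the following sketch. The text only fixes the average numbers of internal and external links; drawing each possible edge independently with a fixed probability is an assumption made here, and the function name `benchmark_network` and its parameters are hypothetical.

```python
import random


def benchmark_network(k_out, k_total=16, n_groups=4, group_size=32, seed=0):
    """Random graph with planted communities and prescribed average degrees."""
    rng = random.Random(seed)
    n = n_groups * group_size
    k_in = k_total - k_out
    # Each node has (group_size - 1) possible internal partners and
    # (n - group_size) possible external ones; these probabilities give
    # k_in internal and k_out external links per node on average.
    p_in = k_in / (group_size - 1)
    p_out = k_out / (n - group_size)
    community = [i // group_size for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if community[i] == community[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges, community


edges, community = benchmark_network(k_out=6)
mean_degree = 2 * len(edges) / len(community)
internal = sum(1 for u, v in edges if community[u] == community[v])
print(len(community), mean_degree, internal / len(edges))
```

Because the planted `community` labels are returned together with the edges, the output of any detection algorithm can be compared against them to count correctly classified nodes.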

Another network which is used as a test for many community-finding algorithms
is the Zachary karate club [12]. In this case, we can compare the modularity value
corresponding to the best split in the two cases: using the Laplacian eigenvectors we
obtain Q = 0.412, while using the eigenvectors of L' leads to Q = 0.419, which is the
best value obtained so far for this benchmark problem [4].

As a last example, we have studied the jazz bands network [13], which is also one 
of the prototypical instances studied in this field. Using the Laplacian we measure 
Q = 0.437, while with L' the modularity increases to Q = 0.444 (almost identical to
the best available result [4]). 

Summarizing, we outlined the connection between the detection of communities
and the spectral properties of suitable matrices describing the network topology.
Moreover, we improved the performance of the algorithm described in [8] by using the
eigenvectors of a different matrix: the normalized Laplacian matrix. We do not have a
clear understanding of why the method equipped with this matrix gives better results
than with the Laplacian matrix, but this is the case in all
the tested examples. Finally, let us mention that the method (with either matrix) can be
easily generalized to the case of weighted networks.